A novel artificial intelligence network to assess the prognosis of gastrointestinal cancer to immunotherapy based on genetic mutation features

Background Immune checkpoint inhibitors (ICIs) have revolutionized gastrointestinal cancer treatment, yet the absence of reliable biomarkers hampers precise prediction of patient response. Methods We developed and validated a genomic mutation signature (GMS) employing a novel artificial intelligence network to forecast the prognosis of gastrointestinal cancer patients undergoing ICI therapy. Subsequently, we explored the underlying immune landscapes across different subtypes using multiomics data. Finally, UMI-77 was identified through the analysis of drug sensitization data from the Genomics of Drug Sensitivity in Cancer (GDSC) database. The sensitivity of the AGS and MKN45 cell lines to UMI-77 was evaluated using the cell counting kit-8 (CCK8) assay and the plate clone formation assay. Results Using the artificial intelligence network, we developed the GMS, which independently predicts the prognosis of gastrointestinal cancer patients. The GMS demonstrated consistent performance across three public cohorts and exhibited high sensitivity and specificity for 6-, 12-, and 24-month overall survival (OS) in receiver operating characteristic (ROC) curve analysis. It outperformed conventional clinical and molecular features. Low-risk samples showed a higher presence of cytolytic immune cells and enhanced immunogenic potential compared to high-risk samples. Additionally, we identified the small molecule compound UMI-77. The half-maximal inhibitory concentration (IC50) of UMI-77 was inversely related to the GMS. Notably, the AGS cell line, classified as high-risk, displayed greater sensitivity to UMI-77, whereas the MKN45 cell line, classified as low-risk, showed less sensitivity. Conclusion The GMS developed here can reliably predict survival benefit for gastrointestinal cancer patients on ICI therapy.


Introduction
Gastrointestinal cancers constitute a significant health challenge worldwide, accounting for 26% of all cancer diagnoses and 35% of cancer-related fatalities (1). Immune checkpoint inhibitors (ICIs) have emerged as a potentially effective therapeutic strategy for a variety of cancer types, including gastrointestinal cancers (2,3). However, the response rate to ICIs is limited, varying from 10-20% across different tumor types (3,4). Consequently, the development of biomarkers capable of accurately identifying patients who are likely to benefit from ICI therapy is of paramount importance.
Microsatellite instability (MSI), a genetic indicator of tumor responsiveness to ICIs, stands as the sole validated biomarker in clinical trials for gastrointestinal cancers (5,6). However, MSI-high tumors are relatively uncommon, representing only 0-5% of all metastatic gastrointestinal cancer cases (7). Programmed death ligand-1 (PD-L1) expression is another commonly assessed biomarker for the application of ICIs, but its predictive value is inconsistent across different trials due to heterogeneity and variability of expression and detection (8,9). Another promising biomarker under investigation is the tumor mutation burden (TMB), which has demonstrated a correlation with response to ICIs in recent research (10). However, TMB is not a reliable biomarker for gastrointestinal cancer (11). Not all mutations have the same immunogenic impact, and mutations in some genes, such as CDKN2A, ARID1A, ARID1B, ARID2, ERBB4, and ZFHX3, may modulate the outcomes of ICI treatment in positive or negative ways (12)(13)(14)(15). In addition, in gastrointestinal tumors, certain genetic mutations are closely linked to the effectiveness of immunotherapy. Mutations in the AKT1 and CDH1 genes have been associated with primary resistance to ICIs (16). These insights highlight the importance of gene mutations in predicting responses to immunotherapy and tailoring personalized treatment approaches for patients with gastrointestinal cancers. TMB scoring systems do not account for the differential effects of these mutations, limiting their predictive value for ICIs (17). To overcome this limitation, some studies have suggested refining the TMB algorithm (18) or constructing gene mutation-based signatures to improve the survival prediction of ICIs in gastrointestinal cancer (17,19,20).
Machine learning and deep learning are powerful tools for solving complex problems in medicine using large clinical data sets (20). These methods have proven effective and efficient in prediction and clustering tasks (21). By applying these novel technologies, we can explore the mechanisms of therapy resistance at different levels, such as transcriptional, epigenetic, and translational levels, and find more clues to improve the efficacy of ICIs (22-24). Thus, we developed a novel artificial intelligence network that integrated traditional regression algorithms, machine learning, and deep learning, comprising a total of 22 algorithms and 297 algorithm combinations, greatly surpassing the previous 101 algorithm combinations (25). This comprehensive approach allows us to more accurately analyze and predict the outcomes of immunotherapy for gastrointestinal tumors.
In this work, we used genomic mutation information to develop and validate an artificial intelligence network-based genomic mutation signature (GMS). This study may provide guidance for immunotherapy treatment decisions and improve the clinical outcomes of gastrointestinal cancer.

Designing research studies and collecting data
We present the overall study design in Figure 1. We collected 233 gastrointestinal cancer cases treated with ICIs from Memorial Sloan Kettering Cancer Center (MSK) as a training cohort to screen for mutations with prognostic potential and to construct a prognostic signature (11). We also obtained two independent validation cohorts of gastrointestinal cancers with ICI treatment from public databases. The combined Janjigian and Pender cohort comprised 39 cases of metastatic chemotherapy-refractory esophagogastric cancer (26) and 9 cases of metastatic or advanced gastrointestinal cancer (27). The PUCH cohort consisted of 91 patients with gastrointestinal cancer (17). The patient enrollment criteria were as follows: (1) primary gastrointestinal cancers; (2) availability of gene mutation profiles and clinical annotations, including follow-up data; (3) receipt of at least one cycle of a CTLA-4 inhibitor, PD-1/PD-L1 inhibitor, or combined treatment. Furthermore, we obtained somatic mutation data, mRNA expression profiles, and copy number variations (CNV) for a nonimmunotherapy gastrointestinal cancer cohort consisting of 184 cases of esophageal cancer, 439 cases of gastric cancer, and 380 cases of colorectal cancer from The Cancer Genome Atlas (TCGA) database. The genomic and clinical data for the MSK cohort, the Janjigian and Pender cohorts, and the PUCH cohort are openly available and were downloaded from the following sources: MSK cohort (http://www.cbioportal.org/study?id=tmb_mskcc_2018), Janjigian cohort (https://www.cbioportal.org/study/summary?id=egc_msk_2017), Pender cohort (http://clincancerres.aacrjournals.org/content/27/1/202.article-info), and PUCH cohort (https://www.bcgsc.ca/downloads/immunoPOG/). The data from the TCGA dataset are available for download at https://portal.gdc.cancer.gov/.

Analysis of mutation data and evaluation of clinical outcomes
Tumor tissues from the MSK and Janjigian cohorts were sequenced using the MSK-IMPACT assay, which involved either a 341-gene panel, a 410-gene panel, or a 468-gene panel. For the Pender cohort, whole-genome sequencing (WGS) was utilized for tumor tissue analysis, and whole-exome sequencing (WES) was employed for the PUCH cohort. The mutated gene status was assigned a value of 1, and the wild-type gene status was assigned a value of 0. The primary survival endpoint considered was overall survival (OS). The clinical response was assessed per the Response Evaluation Criteria in Solid Tumors version 1.1. Durable clinical benefit (DCB) was defined as complete response (CR), partial response (PR), or stable disease (SD) persisting for ≥6 months. Conversely, no durable benefit (NDB) was defined as progressive disease (PD) or SD lasting <6 months (28).
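The encoding and response definitions above can be illustrated with a minimal Python sketch. The function names, the sample identifiers, and the example gene list are hypothetical and not taken from the study code. The sketch also assumes that the 6-month duration threshold applies only to SD, which is one reading of the definition.

```python
from typing import Dict, Iterable, List


def encode_mutations(mutated: Dict[str, Iterable[str]],
                     genes: List[str]) -> Dict[str, List[int]]:
    """Return one 0/1 vector per sample: 1 = mutated, 0 = wild-type."""
    encoded = {}
    for sample, sample_genes in mutated.items():
        hit = set(sample_genes)
        encoded[sample] = [1 if gene in hit else 0 for gene in genes]
    return encoded


def classify_benefit(best_response: str, months_stable: float = 0.0) -> str:
    """Map a RECIST best response to DCB or NDB.

    Assumption: CR and PR count as DCB regardless of duration;
    SD counts as DCB only when it lasts at least 6 months.
    """
    response = best_response.upper()
    if response in ("CR", "PR"):
        return "DCB"
    if response == "SD":
        return "DCB" if months_stable >= 6 else "NDB"
    if response == "PD":
        return "NDB"
    raise ValueError(f"unknown response: {best_response}")


# Illustrative input only: hypothetical samples and an arbitrary gene order.
panel = ["ARID1A", "KRAS", "TP53"]
matrix = encode_mutations({"s1": ["TP53", "ARID1A"], "s2": []}, panel)
```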

Artificial intelligence network-based signature generation
We constructed a novel artificial intelligence network based on 297 algorithm combinations, integrating 22 algorithms from traditional regression, machine learning, and deep learning. These algorithms included random survival forest (RSF), supervised principal components (SuperPC), oblique random survival forests (obliqueRSF), gradient boosting with component-wise linear models (GLMBoost), gradient boosting with regression trees (BlackBoost), stepwise Cox, recursive partitioning and regression trees (Rpart), parametric survival model (Survreg), Ranger, conditional inference trees (Ctree), least absolute shrinkage and selection operator (LASSO), partial least squares regression for Cox (plsRcox), survival support vector machine (survival-SVM), Ridge, elastic network (Enet), deephit survival neural network (DeepHit), deepsurv survival neural network (DeepSurv), cox-time survival neural network (CoxTime), extreme gradient boosting (XGBoost), Coxboost, CForest, and variable selection oriented LASSO bagging algorithm (VSOLassoBag). We developed the signature as follows: (1) Prognostic genes were identified via univariate Cox regression in the MSK cohort. (2) Initial signature discovery utilized the artificial intelligence network in the MSK cohort. (3) The network was further tested in two validation cohorts (Janjigian/Pender and PUCH). (4) Harrell's concordance index (C-index) was used to evaluate each model's performance across all cohorts. The model with the maximal average C-index across the test cohorts was deemed optimal based on its superior predictive ability. The source code and specific parameters of this artificial intelligence network can be found at the following GitHub repository: https://github.com/miaolab1998/AI_network/tree/main.
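To make the selection criterion concrete, the following Python sketch computes Harrell's C-index for right-censored data and picks the model with the highest mean C-index across cohorts. It is an illustration under stated assumptions, not the authors' implementation (which is in the repository above): the function names and the model labels are hypothetical, and pairs with tied survival times are simply skipped.

```python
from typing import Dict, List, Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[int],
                    risks: Sequence[float]) -> float:
    """Fraction of comparable pairs in which the higher risk score
    belongs to the subject with the shorter observed survival."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def best_model(c_index_by_model: Dict[str, List[float]]) -> str:
    """Return the model name with the highest mean C-index."""
    return max(c_index_by_model,
               key=lambda m: sum(c_index_by_model[m]) / len(c_index_by_model[m]))


# Toy data: four patients, the third one censored (event = 0).
c = harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1])
```

In this toy example, five pairs are comparable and four of them are ordered correctly, so `c` is 0.8.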

Functional annotation of the GMS
We collected immune modulators from a previous study (29). We used four algorithms to quantify immune infiltrating cells: the quanTIseq algorithm (30) of 11 immune cells, the estimating the proportions of immune and cancer cells (EPIC) algorithm (31) of eight immune cells, the Microenvironment Cell Populations-counter (MCPcounter) algorithm (32) of ten immune cells, and the Estimation of STromal and Immune cells in Malignant Tumours using Expression data (ESTIMATE) algorithm (33). We also acquired 29 classical immune signatures from the work of He et al. (34). The cytolytic activity scores (CYTs) were estimated using the geometric mean of GZMA and PRF1 (35). Employing the GSVA R package, which is grounded in the single-sample gene set enrichment analysis (ssGSEA) technique, we quantified the enrichment levels of the 29 immune signatures across each sample (36). Utilizing the GSVA (36) and clusterProfiler (37) R packages, we executed gene set variation analysis (GSVA) and gene set enrichment analysis (GSEA) on the MSigDB database. We also used Metascape for enrichment analysis (38).
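The cytolytic activity score is simple enough to state as a formula: the geometric mean of two expression values. A minimal Python sketch follows; the function name and the optional `pseudocount` argument are assumptions added for illustration, and the expression unit (for example TPM) is not specified here.

```python
import math


def cytolytic_activity(gzma: float, prf1: float, pseudocount: float = 0.0) -> float:
    """Geometric mean of GZMA and PRF1 expression.

    `pseudocount` is a hypothetical offset to avoid a zero score when
    one gene is unexpressed; it defaults to no offset.
    """
    if gzma < 0 or prf1 < 0:
        raise ValueError("expression values must be non-negative")
    return math.sqrt((gzma + pseudocount) * (prf1 + pseudocount))


# Example with made-up expression values: sqrt(4 * 9) = 6.
score = cytolytic_activity(4.0, 9.0)
```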

Calculation of immunogenomic indicators
We obtained immunogenomic indicators from the pan-cancer immune landscape study (29). In summary, that study established the intratumoral heterogeneity (ITH) score to quantify the subclonal genomic fraction, reflecting tumor genome segments unaccounted for by the dominant clone. This was determined via ABSOLUTE, a tool modeling tumor alterations including subclonal and clonal components with varying ploidies. CNV burden metrics were n_segs, indicating segment count per sample, and frac_altered, denoting the proportion of bases diverging from baseline ploidy. The aneuploidy score aggregated altered chromosomal arms. Additionally, T-cell receptor (TCR) and B-cell receptor (BCR) diversity indices such as Shannon entropy and richness were calculated from cancer RNA-seq data.
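For readers unfamiliar with the receptor diversity indices, the sketch below shows how Shannon entropy and richness can be computed from clonotype read counts. It is a generic illustration, not the pipeline of the cited study: the function names are hypothetical and the natural logarithm is an assumed convention.

```python
import math
from typing import Sequence


def richness(clone_counts: Sequence[int]) -> int:
    """Number of distinct clonotypes with at least one read."""
    return sum(1 for count in clone_counts if count > 0)


def shannon_entropy(clone_counts: Sequence[int]) -> float:
    """Shannon entropy (natural log) of the clonotype frequency distribution."""
    total = sum(clone_counts)
    if total <= 0:
        raise ValueError("at least one read is required")
    entropy = 0.0
    for count in clone_counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return entropy


# Four equally abundant clones give the maximum entropy for richness 4: ln(4).
even = shannon_entropy([5, 5, 5, 5])
```

A repertoire dominated by a single clone yields an entropy near zero, whereas an even repertoire yields the logarithm of its richness.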

Oncogenic pathway enrichment scores
From the study by Sanchez-Vega et al. (39), we obtained ten canonical oncogenic pathways that include 187 oncogenes. We applied the GSVA method, facilitated by the GSVA R package (36), to calculate the enrichment scores for these pathways in each sample.

Uncovering genomic mutational signatures
Employing the maftools R package, we conducted nonnegative matrix factorization (NMF) on a dataset of 96 trinucleotide context mutations from gastrointestinal cancer specimens, which were obtained from the TCGA.We then compared the resulting mutational landscape to the Catalogue of Somatic Mutations in Cancer (COSMIC), employing cosine similarity for the assessment.
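The comparison step can be sketched in Python as follows. This shows only the cosine similarity between an extracted profile and reference profiles, not the NMF itself, and it is not the maftools implementation. The function names and the reference labels are hypothetical, and the three-channel vectors stand in for the 96-channel profiles to keep the example short.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two mutation-context profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("profiles must be non-zero")
    return dot / (norm_a * norm_b)


def best_match(profile: Sequence[float],
               references: Dict[str, Sequence[float]]) -> str:
    """Name of the reference signature most similar to `profile`."""
    return max(references, key=lambda name: cosine_similarity(profile, references[name]))


# Toy profiles with three channels instead of 96.
refs = {"sig_x": [0.9, 0.1, 0.0], "sig_y": [0.0, 1.0, 0.0]}
closest = best_match([1.0, 0.0, 0.0], refs)
```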

Drug prediction
We retrieved data on tumor cell line sensitivity to potential drugs and on mutations from the Genomics of Drug Sensitivity in Cancer (GDSC) database. Cell line sensitivity was assessed using the half-maximal inhibitory concentration (IC50) values of the respective drugs, with lower values indicating greater sensitivity.

Cell line culture
The human gastric cancer cell lines AGS and MKN45 were acquired from the Shanghai Institutes for Biological Sciences, part of the Chinese Academy of Sciences. MKN45 cells were grown in RPMI 1640 medium supplemented with 10% FBS and 1% penicillin-streptomycin. AGS cells were cultivated in Ham's F-12 medium with the same supplements. The cells were incubated at 37°C with 5% CO2.

CCK-8 detection
Cells were seeded into a 96-well plate at a density of 5,000 cells per well. We treated the cells with different concentrations of UMI-77 and incubated them for 48 h and 72 h. We then measured the absorbance, plotted the cell growth curve, and calculated the IC50.
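The text does not state how the IC50 was derived from the absorbance readings. One simple possibility, shown below purely as an assumption, is to convert absorbance to relative viability and interpolate on a log-concentration scale between the two doses that bracket 50% viability; nonlinear curve fitting is a common alternative. All function names and the example values are hypothetical.

```python
import math
from typing import Sequence


def viability(od_treated: float, od_control: float, od_blank: float) -> float:
    """Relative viability from CCK-8 absorbance, corrected for the blank well."""
    return (od_treated - od_blank) / (od_control - od_blank)


def ic50_by_interpolation(concentrations: Sequence[float],
                          viabilities: Sequence[float]) -> float:
    """Concentration at which viability crosses 0.5, interpolated on log10(dose).

    Assumes all concentrations are positive.
    """
    pairs = sorted(zip(concentrations, viabilities))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(pairs, pairs[1:]):
        if v_lo >= 0.5 >= v_hi and v_lo != v_hi:
            frac = (v_lo - 0.5) / (v_lo - v_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    raise ValueError("viability does not cross 50% in the tested range")


# Made-up dose-response: viability falls from 75% to 25% between doses 1 and 100.
estimate = ic50_by_interpolation([1.0, 100.0], [0.75, 0.25])
```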

Colony formation assay
One thousand cells were seeded in each well of a six-well plate and cultured with UMI-77, with DMSO, or without any treatment for 2 weeks. Colony formation was then analyzed.

Statistical analysis
Categorical data were examined with the chi-square test, and numerical data with the Wilcoxon test. Pearson correlation was employed for association analysis. Survival curves were generated with the survival and survminer packages in R. Univariate and multivariate Cox regression analyses were performed to assess whether the GMS was independent of clinical factors. Receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) were used to determine the predictive sensitivity and specificity for survival or response. Statistical significance was defined as a P value below 0.05, unless stated otherwise. All analyses were conducted using R version 4.2.3.
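As a reminder of what the AUC measures for a binary outcome such as clinical response, the sketch below computes it as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. This is a generic Python illustration with a hypothetical function name; it does not reproduce the time-dependent ROC analysis used for survival, and the analyses in this study were run in R.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via pairwise comparison of positive (1) and negative (0) cases."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy data: three of the four positive-negative pairs are ordered correctly.
auc = roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```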

Construction and validation of the GMS
The characteristics of patients in these three immunotherapeutic cohorts are detailed in Supplementary Table 1. The training cohort consisted of 233 gastrointestinal cancer patients (esophagogastric cancer, N = 123; colorectal cancer, N = 110) from MSK who received ICIs. We identified 74 prognostic genes through univariate Cox analysis and selected seed genes with a mutation frequency greater than 3%. These genes were then subjected to our artificial intelligence network to construct a GMS. The optimal model, comprising a combination of VSOLassoBag and RSF, was determined based on its highest average C-index (C-index = 0.71) among the 297 algorithm combinations evaluated through 10-fold cross-validation (Figure 2A). The VSOLassoBag algorithm selected 23 genes based on the curve elbow point detection (CEP) method, and these genes were used to construct the most reliable GMS by RSF (Figures 2B, C). A GMS score was calculated for each patient, and patients were stratified into high- and low-risk groups using the median GMS score of the training set (16.65). The high-risk group had markedly worse OS than the low-risk group across all cohorts (all p < 0.05) (Figures 2D-F). In the MSK cohort, 6-month AUC = 0.785, 12-month AUC = 0.799, and 18-month AUC = 0.837 (Figure 2D). In the Janjigian and Pender cohort, 6-month AUC = 0.771, 12-month AUC = 0.823, and 18-month AUC = 0.829 (Figure 2E). In the PUCH cohort, 6-month AUC = 0.782, 12-month AUC = 0.699, and 18-month AUC = 0.697 (Figure 2F). The time-dependent ROC curves demonstrated the strong and consistent performance of the GMS across all cohorts. In the two test cohorts, a notable number of patients with DCB had low GMS scores (all p < 0.05). The ROC analyses in these cohorts suggested that the GMS could be a valuable predictive biomarker for immunotherapy clinical benefit, with AUCs of 0.786 and 0.643, respectively (Figures 2G, H). These findings suggest the GMS may act as a robust predictor of responses and outcomes for gastrointestinal cancer patients undergoing immunotherapy.
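The risk stratification described above amounts to fixing a cutoff on the training cohort and reusing it everywhere else. A minimal Python sketch, assuming that scores strictly above the cutoff are labeled high-risk (the handling of scores exactly equal to the cutoff is not stated in the text) and using hypothetical function names:

```python
from statistics import median
from typing import List, Sequence


def training_cutoff(train_scores: Sequence[float]) -> float:
    """Median GMS score of the training cohort."""
    return median(train_scores)


def stratify(scores: Sequence[float], cutoff: float) -> List[str]:
    """Label each score 'high' or 'low' risk relative to a fixed cutoff.

    Assumption: a score equal to the cutoff is labeled 'low'.
    """
    return ["high" if score > cutoff else "low" for score in scores]


# Applying the reported training median (16.65) to the two cell-line
# scores reported later in the text (AGS 17.91, MKN45 4.43).
groups = stratify([17.91, 4.43], 16.65)
```

Reusing the training-set cutoff in the validation cohorts, rather than recomputing a median per cohort, keeps the risk definition identical across datasets.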

The strong predictive performance of GMS
Univariate and multivariate Cox regression analyses were conducted across all cohorts to evaluate GMS as an independent predictor of OS in immunotherapy patients. In the univariate and multivariate analyses, GMS emerged as a robust predictor, not affected by adjustments for age, gender, drug category, MSI, PD-L1, and TMB (Figures 3A-C), solidifying its predictive utility in prognosis. To assess the predictive superiority of GMS, we compared it against common clinical traits and molecular features. GMS exhibited significantly higher accuracy compared to other variables, such as age, gender, drug type, the genomic mutation signature of immunotherapy for gastrointestinal tumors identified in previous studies (GIPS) (17), TMB, MSI, and PD-L1, across all three cohorts (Figures 3D-F). These results indicate that our GMS holds promise as a reliable surrogate for predicting the prognosis of gastrointestinal cancer patients receiving immunotherapy in clinical practice.

Potential biological peculiarities of the GMS
We examined the biological mechanisms of GMS in the TCGA dataset. We noted that the GMS displayed a negative correlation with numerous immune pathways, including graft-versus-host disease, natural killer cell-mediated cytotoxicity, cytokine-cytokine receptor interaction, antigen processing, asthma, allograft rejection, and autoimmune thyroid disease pathways (Figure 4A). Conversely, the GMS showed a positive correlation with several tumorigenic pathways, such as DNA replication, mismatch repair, manchette assembly, cytosine DNA methylation, meiotic telomere clustering, and cell cycle pathways (Figure 4A). Further analysis revealed significant differences in immunological and tumorigenic pathways between the high- and low-risk groups (Figure 4B). The genes with high expression in the low-risk group were enriched in immune activation and infiltration pathways (Figure 4C). GSEA using Kyoto Encyclopedia of Genes and Genomes (KEGG) terms showed the low-risk group had enrichment in NK cell cytotoxicity, Th17 cell differentiation, and influenza A, as anticipated (Figure 4D). In contrast, the high-risk group displayed enrichment in DNA replication and cell cycle pathways. These

Extrinsic immune landscapes of the GMS
We assessed the GMS as an indicator of immune status by analyzing its association with infiltration of immune cells and expression of immune checkpoints. Figures 5A, B show that the low-risk group had increased infiltration of immune cells and immune modulatory activity in the TCGA dataset. Comparison of the 29 immune signatures between groups revealed that the low-risk group had higher prevalence of immune cells including CD8+ T cells (p < 0.05) (Figure 5C). To determine if the risk groups corresponded to low and high infiltration cohorts, unsupervised clustering of the 29 immune signatures for TCGA patients was performed. This identified two distinct immune patterns: high and low immune infiltration (Figure 5D). Notably, the low-risk group was more common in the high infiltration cluster (p < 0.05) (Figure 5E). Furthermore, low-risk tumors were linked to significantly higher CYT scores (p < 0.05) (Figure 5F). These results implied a relatively inflamed and immunostimulatory microenvironment, which may be amenable to immunotherapy (40).
Frontiers in Immunology frontiersin.org

Intrinsic immune landscapes of the GMS
To clarify the factors affecting tumor immunogenicity between the two risk groups, we initially examined the neoantigen load, TCR diversity, mutation rate, BCR diversity, CNV burden, aneuploidy, and intratumoral heterogeneity. Compared to the high-risk group, the low-risk group harbored a higher mutation rate and neoantigen burden alongside significantly greater BCR and TCR diversity (all p < 0.05) (Figure 6A). However, the high-risk group exhibited significantly higher aneuploidy and CNV burdens (all p < 0.05) (Figure 6A). This aligns with existing research associating tumor aneuploidy with dampened immunotherapy responses (41). Compared to the low-risk group, individuals in the high-risk group exhibited significantly greater intratumoral heterogeneity (p < 0.05) (Figure 6A). This finding aligns with the hypothesis that tumors, facing a diminished immune response, may evolve clonally, leading to increased heterogeneity. This suggests that the heightened immunogenicity in the low-risk group might trigger an extrinsic immune response. To further explore the underlying mutational processes, we profiled mutational signatures based on somatic mutation data in both groups. This analysis revealed two distinct mutagenic patterns within the TCGA cohort (Figure 6B). The low-risk group exhibited a higher prevalence of SBS6, a mutational signature associated with defective DNA mismatch repair (Figure 6C). We further analyzed oncogene enrichment in ten key pathways, revealing distinct patterns. Whereas the cell cycle and Wnt pathways were enriched in the high-risk group (potentially linked to immune exclusion) (42), the Notch, PI3K, RAS, TGF beta, and TP53 pathways showed higher activity in the low-risk group (Figure 6D).

Copy number features of the GMS
The high-risk and low-risk groups harbored vastly different chromosomal abnormalities (Figure 7A). Notably, unlike the high-risk group, the low-risk group exhibited focal amplifications of immune genes, including PD-L1 (9p24.1) and PD-L2 (9p24.1) (Figures 7B, C). While 625 amplified genes were shared between the groups, the high-risk and low-risk groups harbored 373 and 1597 unique amplified genes, respectively. We further analyzed these amplified genes using Gene Ontology (GO) biological processes (Figure 7D). The GO enrichment analysis revealed a different pattern in the low-risk group (Figure 7D), including five immune-related processes focused on cell proliferation (mononuclear, lymphocyte, and leukocyte) and adaptive immunity through immunoglobulin superfamily domain recombination. Notably, no such enrichment of immune pathways was observed in the high-risk group (Figure 7D). Intriguingly, PD-L1 and PD-L2, key players in immune modulation, reside within the 9p24.1 amplification peak unique to the low-risk group, suggesting their potential contribution to the observed enhanced immune response. Consistent with this, mRNA expression of PD-L1 and PD-L2 mirrored the CNV pattern, with their levels being significantly higher in the low-risk group (Figure 7D), highlighting the influence of tumor copy number variations on immune infiltration patterns.

Identification of small molecule drugs negatively associated with GMS
Based on the GDSC database, we identified that UMI-77, luminespib, lapatinib, and sapitinib exhibited the lowest p-values in the correlation test between GMS score and IC50, with UMI-77 having the smallest p-value (p < 0.05) (Figure 8A). We inferred that UMI-77 could be more effective for high-risk patients. To test this hypothesis, we measured the GMS of two cell lines in our laboratory (GMS score of AGS: 17.91; GMS score of MKN45: 4.43) and compared their sensitivity to UMI-77. The IC50 of UMI-77 for AGS and MKN45 was 8 μM and 125 μM, respectively (Figure 8B). A plate clone formation assay confirmed that AGS was more sensitive to UMI-77 (Figure 8C).

Discussion
A genomic classifier named GMS, consisting of 23 genes, was developed and validated. It was derived from an artificial intelligence network aimed at enhancing the prediction of ICI therapy efficacy in gastrointestinal cancer patients. The most efficient model was a combination of the VSOLassoBag and RSF methods, which displayed the highest average C-index in the test cohorts. The GMS had a prognostic value independent of other factors and showed consistent performance in the validation cohorts. ROC analysis also demonstrated that the GMS had high sensitivity and specificity in predicting 6/12/24 months OS and clinical response. The GMS exhibited a significantly superior level of predictive accuracy in comparison to both clinical attributes (e.g., sex) and molecular characteristics (e.g., MSI, TMB, and PD-L1 expression). This indicates the considerable potential for enhanced clinical translation and utilization of the GMS.
Leveraging the comprehensive data of the TCGA cohort, we delved into the diverse responses of cancers to immunotherapy treatment. The low-risk group stood out for its dense immune cell infiltration, consistently supported by multiple algorithms. This internal immunological terrain was additionally fortified by potent immunogenic features: elevated mutation rates and a substantial neoantigen burden. In contrast to the high-risk group, the low-risk group also exhibited increased expression of immune checkpoint proteins such as PD-L1, PD-1, and CTLA-4, which could contribute to a more favorable response to ICI therapy. The activated antitumor immunity, elevated PD-L1, PD-L2 and CTLA-4 expression, and heightened tumor immunogenicity likely explain why the low-risk group benefits more from ICI therapy than their high-risk counterparts.
Our research offers the following novel contributions and practical implications. Firstly, we have developed an artificial intelligence network that comprises 297 algorithm combinations. This integration encompasses 22 algorithms, drawn from traditional regression, machine learning, and deep learning methodologies. This network featured a diverse and comprehensive set of algorithms, and exhibited superior predictive performance compared with previous studies (17,25). Moreover, the optimal combination was VSOLassoBag and RSF, which was not considered in the prior study (25). The dimensionality of the variables was further reduced by the additional algorithm combinations, making the GMS simpler and more feasible. Secondly, the creation of multibiomarker predictive models demands a thorough comprehension of the elements impacting the dependability and precision of high-throughput assays in clinical scenarios. The variability in biomarker measurements, particularly that which is technical (platform-dependent), is a critical concern. A number of mRNA-based signatures have been developed to forecast clinical efficacy for patients receiving ICI therapy, including the T cell-inflamed gene-expression profile (GEP), which comprises an 18-gene panel (43). The evaluation of mRNA expression is carried out through relative quantification by normalizing it to reference genes (44). The risk scoring and threshold values of mRNA signatures may not be directly applicable for validation with diverse measurement data types. In this study, we have identified specific gene mutations to forecast the clinical effectiveness of ICIs. Consequently, the GMS is resilient to technical variations, even when different platforms are employed across various centers. Thirdly, in clinical practice, the GMS aids in avoiding potential immune-related adverse effects for patients who are unlikely to respond, and it enables the early identification of patients who may benefit from more effective therapies. Additionally, given that the average cost of a treatment regimen often exceeds $120,000 (45), implementing biomarker strategies that improve diagnostic precision can help prevent significant costs for treatments with limited expected benefits. In summary, since targeted next-generation sequencing (NGS) of these genes in tumor specimens is simpler and less costly than assessing TMB, which is complex and expensive in routine practice, the GMS with these refinements merits evaluation. Such an assessment could enhance diagnostic accuracy and cost-effectiveness in the clinic.
Utilizing the GDSC, we identified UMI-77, a small molecule drug that showed the smallest p-value and a strong negative correlation with GMS. UMI-77 is a candidate drug for pancreatic cancer, known to inhibit cell proliferation and induce apoptosis in pancreatic cancer cells (46). Moreover, UMI-77 triggers mitophagy, a process that selectively eliminates damaged mitochondria, making it a potential therapeutic option for Alzheimer's disease (47,48) and glioma (49). Our observations revealed that the AGS cell line, categorized in the high-risk group, displayed greater sensitivity to UMI-77 than the MKN45 cell line, which belongs to the low-risk group. Based on these findings, we hypothesize that combining UMI-77 with ICIs may enhance the efficacy of ICIs in the high-risk group. However, this hypothesis necessitates further validation through in vivo experiments.

Limitations
Our research is not without limitations, which are important to acknowledge. Firstly, we did not have access to comprehensive clinical records for all patients, potentially introducing bias in the data analysis. Secondly, the inclusion of diverse gastrointestinal cancer types and the retrospective nature of the study may have introduced confounding factors. Thirdly, the abundance of immune cells and the expression of immune checkpoints should be substantiated through immunohistochemistry techniques. To address these limitations, further analysis and validation are needed through prospective studies involving a large cohort of gastrointestinal cancer patients with diverse ethnic backgrounds receiving ICI therapy. Such studies would help strengthen the findings and implications of our research.

Conclusions
In summary, our GMS emerges as a promising biomarker for both prognosis and prediction of ICI treatment response in gastrointestinal cancer patients. This signature also presents an economical approach to pinpoint patients who may benefit from immunotherapy, a concept that should be further explored through prospective research. The GMS could significantly contribute to the refinement of personalized treatment plans and the enhancement of patient outcomes in gastrointestinal cancer immunotherapy.

FIGURE 2 Development and validation of an artificial intelligence network using 297 algorithm combinations. (A) Evaluation and C-index computation for 297 prediction models across all validation datasets. (B) Determination of the number of trees by minimizing error. (C) Variable importance of the top 23 genes determined using the random survival forest (RSF) algorithm. (D-F) Kaplan-Meier survival analysis (left) and receiver operating characteristic (ROC) (right) curves for overall survival (OS) in the MSK (D), Janjigian and Pender (E), and PUCH (F) cohorts. (G, H) The correlation between genomic mutation signature (GMS) and response (left), as well as the ROC of GMS predicting clinical response (right) in the Janjigian and Pender cohort (G), and PUCH cohort (H).

FIGURE 3 Univariate and multivariate Cox regression analyses of the GMS and other characteristics. (A) GMS subjected to univariate and multivariate Cox regression analyses in the MSK cohort. (B) GMS subjected to univariate and multivariate Cox regression analyses in the Janjigian and Pender cohort. (C) GMS subjected to univariate and multivariate Cox regression analyses in the PUCH cohort. (D) Comparison of GMS performance with other clinical and molecular variables for prognosis prediction in the MSK cohort. (E) Comparison of GMS performance with other clinical and molecular variables for prognosis prediction in the Janjigian and Pender cohort. (F) Comparison of GMS performance with other clinical and molecular variables for prognosis prediction in the PUCH cohort.

FIGURE 4 Biological peculiarities of the GMS in the TCGA dataset. (A) Outlining the biological characteristics of two groups based on GMS using MSigDB-based Gene Set Variation Analysis (GSVA) in the TCGA dataset. (B) T-distributed Stochastic Neighbor Embedding (t-SNE) plot to illustrate differences in pathway activity between two GMS groups based on Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) terms. (C) Metascape-based enrichment analysis of high expression genes in the low-risk group. (D) Gene Set Enrichment Analysis (GSEA) for GO and KEGG terms to investigate biological pathways associated with GMS in the TCGA dataset. **p < 0.01; ***p < 0.001.

FIGURE 5 Immune infiltrating characteristics of the GMS in the cohort from TCGA. (A) The relationship between the GMS and infiltrating immune cell populations. (B) The association between the GMS and immune modulatory factors. (C) The relationship between the GMS and 29 immune signature scores. (D) Unsupervised clustering based on 29 immune signatures. (E) The proportions of high and low immune infiltration were estimated in both the high-risk and low-risk groups. (F) A comparison of the cytolytic activity scores (CYTs) was conducted between the high-risk and low-risk groups. NS, not significant; *p < 0.05; **p < 0.01; ***p < 0.001.

FIGURE 6 Exploration of potential intrinsic immune response and escape landscapes in the high-risk and low-risk groups. (A) Comparison of immunogenomic markers between the high-risk and low-risk groups. (B) Analysis of mutational activities of two extracted mutational signatures. (C) Comparison of the SBS6 signature activity between high-risk and low-risk groups. (D) Comparison of enrichment scores for 10 oncogenic pathways between high-risk and low-risk groups. NS, not significant; *p < 0.05; **p < 0.01; ***p < 0.001.

FIGURE 7 Examination of Copy Number Alterations in High-Risk and Low-Risk Groups. (A) Displaying copy number profiles for the high-risk group (upper) and low-risk group (lower). (B) Elaborating on cytobands with focal amplifications (left) and deletions (right) peaks identified within the high-risk group. (C) Exploring cytobands with focal amplifications (left) and deletions (right) peaks detected in the low-risk group. (D) Circular plot showcasing the top 5 biological processes along with their corresponding enriched genes in the high-risk (left) and low-risk (right) groups. Additionally, comparing the mRNA expression of PD-L1 and PD-L2 between the high-risk and low-risk cohorts from TCGA (middle). **p < 0.01; ***p < 0.001.